Facing data scarcity using variable feature vector dimension

Authors

  • Pablo Daniel Agüero
  • Antonio Bonafonte
Abstract

This paper focuses on three key issues in intonation modelling: interpolation of the fundamental frequency (F0) contour, sentence-by-sentence parameter extraction, and data scarcity. In some cases, these introduce noise and inconsistency into the training data, reducing the performance of machine learning techniques. We consider the F0 contour to be segmented into prosodic units (such as accent groups, minor phrases, etc.). Each segment of the F0 contour has a corresponding feature vector with linguistic and non-linguistic components. We propose to address the limitations mentioned above with a technique based on clustering with different feature vector dimensions. The clustering of feature vectors also produces a partition of the F0 contour space. The proposal consists of a procedure that selects the dimension yielding the best predicted fundamental frequency contour, in the RMSE sense, relative to a reference contour. Experimental results show an improvement over other approaches.
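The dimension-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, so grouping segments by exact match on their first `dim` feature values is used here as a stand-in. The function names, the fallback to a global mean contour, the assumption that all contours are resampled to the same length, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def mean_contour(contours):
    """Point-wise mean of equally long F0 contours."""
    n = len(contours)
    return [sum(c[i] for c in contours) / n for i in range(len(contours[0]))]


def rmse(predicted, reference):
    """Root mean squared error over all points of all contours."""
    sq = [(p - r) ** 2
          for pc, rc in zip(predicted, reference)
          for p, r in zip(pc, rc)]
    return math.sqrt(sum(sq) / len(sq))


def fit_clusters(features, contours, dim):
    """Group segments by their first `dim` feature values (an assumed
    stand-in for clustering) and keep one mean contour per group."""
    groups = defaultdict(list)
    for f, c in zip(features, contours):
        groups[tuple(f[:dim])].append(c)
    return {key: mean_contour(cs) for key, cs in groups.items()}


def predict(clusters, fallback, features, dim):
    """Look up each segment's group; unseen keys get the fallback contour."""
    return [clusters.get(tuple(f[:dim]), fallback) for f in features]


def select_dimension(train, held_out, max_dim):
    """Return (best_dim, rmse_by_dim) for dimensions 0..max_dim."""
    train_f, train_c = train
    held_f, held_c = held_out
    fallback = mean_contour(train_c)  # assumed back-off for unseen groups
    scores = {}
    for dim in range(max_dim + 1):
        clusters = fit_clusters(train_f, train_c, dim)
        scores[dim] = rmse(predict(clusters, fallback, held_f, dim), held_c)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up toy data: features are (accented, position), contours are in Hz.
train_f = [(1, 0), (1, 1), (0, 0), (0, 1)]
train_c = [[120, 140, 120], [118, 142, 122], [100, 100, 100], [102, 98, 100]]
held_f = [(1, 2), (0, 2)]  # position 2 never occurs in training
held_c = [[119, 141, 121], [101, 99, 100]]

best, scores = select_dimension((train_f, train_c), (held_f, held_c), max_dim=2)
print(best, scores)
```

In this toy setting, using both features leaves the held-out segments without a matching group, so they fall back to the global mean, while using only the first feature still finds a group. That is one way data scarcity can make a smaller feature dimension preferable.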

Similar resources

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression gives access to a wealth of information involving thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce high-dimensional data by a statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumors based on gene expression data of...

Variable Selection as an Instance-Based Ontology Mapping Strategy

The paper presents a novel instance-based approach to aligning concepts taken from two heterogeneous ontologies populated with text documents. We introduce a concept similarity measure based on the size of the intersection of the sets of variables which are most important for the class separation of the instances in both input ontologies. We suggest a VC dimension variable selection criterion e...

Variable Dimension Vector Quantization of Speech Spectra for Low Rate Vocoders

Optimal vector quantization of variable-dimension vectors in principle is feasible by using a set of fixed dimension VQ codebooks. However, for typical applications, such a multi-codebook approach demands a grossly excessive and impractical storage and computational complexity. Efficient quantization of such variable-dimension spectral shape vectors is the most challenging and difficult encodin...

Applying Genetic Algorithm to EEG Signals for Feature Reduction in Mental Task Classification

Brain-computer interface (BCI) systems are a new mode of communication that provides a new path between the brain and its surroundings by processing EEG signals measured in different mental states. Choosing suitable features is therefore required for good BCI communication. In this regard, one of the points to be considered is feature vector dimensionality. We present a method of feature reduction us...

F0 feature extraction by polynomial regression function for monosyllabic Thai tone recognition

This paper presents a monosyllabic Thai tone recognition system. The system is composed of three main processes: fundamental frequency (F0) extraction from the input speech signal, analysis of the F0 contour for feature extraction, and classification of each tone using the extracted features. In the F0 feature extraction, polynomial regression functions are employed to fit the segmented F0 curve wh...

Publication date: 2008